(54) Method of generating audio and/or video signals and apparatus therefore



(57) An audio and/or video generation apparatus which is arranged in operation to generate audio and/or
video signals representative of an audio and/or video source, the audio and/or video generation apparatus
comprising a recording means which is arranged in operation to record the audio and/or video signals on a
recording medium, wherein the audio and/or video generation apparatus is arranged to receive metadata
associated with the audio and/or video signals generated by a data processor, the recording means being
arranged in operation to record the metadata on the recording medium with the audio and/or video signals.
The data processor may be arranged to receive signals representative of the time codes of the recorded
audio/video signals, and the metadata may include time code data representative of the in and out points
of a take of the audio/video signals generated by the data processor. The metadata may also include a
unique identification code for identifying the audio/video signals. The unique identification code may be
a UMID or the like.
Brief Description of the Drawings

[0022] Embodiments of the present invention will now be described by way of example with reference to the
accompanying drawings wherein:

Figure 1 is a schematic block diagram of a video camera arranged in operative association with a Personal Digital
Assistant (PDA),

Figure 2 is a schematic block diagram of parts of the video camera shown in figure 1,

Figure 3 is a pictorial representation providing an example of the form of the PDA shown in figure 1,

Figure 4 is a schematic block diagram of a further example arrangement of parts of a video camera and some of
the parts of the video camera associated with generating and processing metadata as a separate acquisition unit
associated with a further example PDA,

Figure 5 is a pictorial representation providing an example of the form of the acquisition unit shown in figure 4,

Figure 6 is a part schematic part pictorial representation illustrating an example of the connection between the
acquisition unit and the video camera of figure 4,

Figure 7 is a part schematic block diagram of an ingestion processor coupled to a network, part flow diagram
illustrating the ingestion of metadata and audio/video material items,

Figure 8 is a pictorial representation of the ingestion processor shown in figure 7,

Figure 9 is a part schematic block diagram part pictorial representation of the ingestion processor shown in figures
7 and 8 shown in more detail,

Figure 10 is a schematic block diagram showing the ingestion processor shown in operative association with the
database of figure 7,

Figure 11 is a schematic block diagram showing a further example of the operation of the ingestion processor
shown in figure 7,

Figure 12a is a schematic representation of the generation of picture stamps at sample times of audio/video
material,

Figure 12b is a schematic representation of the generation of text samples with respect to time of the audio/video
material,

Figure 13 provides an illustrative representation of an example structure for organising metadata,

Figure 14 is a schematic block diagram illustrating the structure of a data reduced UMID, and

Figure 15 is a schematic block diagram illustrating the structure of an extended UMID.

Description of Preferred Embodiments 

Acquisition Unit

[0023] Embodiments of the present invention relate to audio and/or video generation apparatus which may be, for
example, television cameras, video cameras or camcorders. An embodiment of the present invention will now be
described with reference to figure 1, which provides a schematic block diagram of a video camera which is arranged to
communicate with a personal digital assistant (PDA). A PDA is an example of a data processor which may be arranged
in operation to generate metadata in accordance with a user's requirements. The term personal digital assistant is
known to those acquainted with the technical field of consumer electronics as a portable or hand held personal
organiser or data processor which includes an alphanumeric key pad and a hand writing interface.

[0024] In figure 1 a video camera 101 is shown to comprise a camera body 102 which is arranged to receive light
from an image source falling within a field of view of an imaging arrangement 104 which may include one or more
imaging lenses (not shown). The camera also includes a view finder 106 and an operating control unit 108 from which a
user can control the recording of signals representative of the images formed within the field of view of the camera.
The camera 101 also includes a microphone 110 which may be a plurality of microphones arranged to record sound in
stereo. Also shown in figure 1 is a hand-held PDA 112 which has a screen 114 and an alphanumeric key pad 116 which
also includes a portion to allow the user to write characters recognised by the PDA. The PDA 112 is arranged to be
connected to the video camera 101 via an interface 118. The interface 118 is arranged in accordance with a
predetermined standard format such as, for example, RS232 or the like. The interface 118 may also be effected using
infra-red signals, whereby the interface 118 is a wireless communications link. The interface 118 provides a facility for
communicating information with the video camera 101. The function and purpose of the PDA 112 will be explained in
more detail shortly. However in general the PDA 112 provides a facility for sending and receiving metadata generated
using the PDA 112 and which can be recorded with the audio and video signals detected and captured by the video
camera 101. A better understanding of the operation of the video camera 101 in combination with the PDA 112 may be
gathered from figure 2 which shows a more detailed representation of the body 102 of the video camera which is shown
in figure 1 and in which common parts have the same numerical designations.

[0025] In figure 2 the camera body 102 is shown to comprise a tape drive 122 having read/write heads 124
operatively associated with a magnetic recording tape 126. Also shown in figure 2, the camera body includes a metadata
generation processor 128 coupled to the tape drive 122 via a connecting channel 130. Also connected to the metadata
generation processor 128 are a data store 132, a clock 136 and three sensors 138, 140, 142. The interface unit 118,
also shown in figure 2, sends and receives data via a wireless channel 119. Correspondingly, two connecting channels
148 and 150, for receiving and transmitting data respectively, connect the interface unit 118 to the metadata generation
processor 128. The metadata generation processor is also shown to receive via a connecting channel 151 the
audio/video signals generated by the camera. The audio/video signals are also fed to the tape drive 122 to be recorded
on to the tape 126.

[0026] The video camera 101 shown in figure 1 operates to record visual information falling within the field of view
of the lens arrangement 104 onto a recording medium. The visual information is converted by the camera into video
signals. In combination, the visual images are recorded as video signals with accompanying sound which is detected
by the microphone 110 and arranged to be recorded as audio signals on the recording medium with the video signals.
As shown in figure 2, the recording medium is a magnetic tape 126 on to which the audio and video signals are
recorded by the read/write heads 124. The arrangement by which the video signals and the audio signals are recorded
by the read/write heads 124 onto the magnetic tape 126 is not shown in figure 2 and will not be further described as this
does not provide any greater illustration of the example embodiment of the present invention. However once a user has
captured visual images and recorded these images, together with the accompanying audio signals, using the magnetic
tape 126, metadata describing the content of the audio/video signals may be input using the PDA 112. As will be
explained shortly this metadata can be information that identifies the audio/video signals in association with a
pre-planned event, such as a "take". As shown in figure 2 the interface unit 118 provides a facility whereby the
metadata added by the user using the PDA 112 may be received within the camera body 102. Data signals may be
received via the wireless channel 119 at the interface unit 118. The interface unit 118 serves to convert these signals
into a form in which they can be processed by the metadata generation processor 128 which receives these data signals
via the connecting channels 148, 150.

[0027] Metadata is generated automatically by the metadata generation processor 128 in association with the
audio/video signals which are received via the connecting channel 151. In the example embodiment illustrated in figure
2, the metadata generation processor 128 operates to generate time codes with reference to the clock 136, and to write
these time codes on to the tape 126 in a linear recording track provided for this purpose. The time codes are formed by
the metadata generation processor 128 from the clock 136. Furthermore, the metadata generation processor 128 forms
other metadata automatically such as a UMID, which identifies uniquely the audio/video signals. The metadata
generation processor may operate in combination with the tape drive 122, to write the UMID on to the tape with the
audio/video signals.

[0028] In an alternative embodiment, the UMID, as well as other metadata, may be stored in the data store 132 and
communicated separately from the tape 126. In this case, a tape ID is generated by the metadata generation processor
128 and written on to the tape 126, to identify the tape 126 from other tapes.

[0029] In order to generate the UMID, and other metadata identifying the contents of the audio/video signals, the
metadata generation processor 128 is arranged in operation to receive signals from the sensors 138, 140, 142, as well
as the clock 136. The metadata generation processor therefore operates to co-ordinate these signals, which provide it
with metadata such as the aperture setting of the camera lens 104, the shutter speed and a signal received via the
control unit 108 to indicate that the visual images captured are a "good shot". These signals and data are generated by
the sensors 138, 140, 142 and received at the metadata generation processor 128. The metadata generation processor
in the example embodiment is arranged to produce syntactic metadata which provides operating parameters which are
used by the camera in generating the video signals. Furthermore the metadata generation processor 128 monitors the
status of the camcorder 101, and in particular whether audio/video signals are being recorded by the tape drive 122.
When RECORD START is detected the IN POINT time code is captured and a UMID is generated in correspondence
with the IN POINT time code. Furthermore in some embodiments an extended UMID is generated, in which case the
metadata generation processor is arranged to receive spatial co-ordinates which are representative of the location at
which the audio/video signals are acquired. The spatial co-ordinates may be generated by a receiver which operates in
accordance with the Global Positioning System (GPS). The receiver may be external to the camera, or may be
embodied within the camera body 102.

[0030] When RECORD END is detected, the OUT POINT time code is captured by the metadata generation
processor 128. As explained above, it is possible to generate a "good shot" marker. The "good shot" marker is
generated during the recording process, and detected by the metadata generation processor. The "good shot" marker is
then either stored on the tape, or within the data store 132, with the corresponding IN POINT and OUT POINT time codes.
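
The capture of IN and OUT points around a recording, together with the "good shot" marker, can be sketched as
follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, and
`uuid.uuid4()` is a stand-in for UMID generation, which uses a different format.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TakeRecord:
    """Metadata logged for one take (field names are illustrative)."""
    in_point: str
    material_id: str            # stand-in for a UMID, not a real UMID
    out_point: Optional[str] = None
    good_shot: bool = False


class TakeLogger:
    """Hypothetical logger reacting to changes in recorder status."""

    def __init__(self) -> None:
        self.takes: List[TakeRecord] = []
        self._current: Optional[TakeRecord] = None

    def record_start(self, time_code: str) -> None:
        # Capture the IN point and create an identifier for the new take.
        self._current = TakeRecord(in_point=time_code,
                                   material_id=uuid.uuid4().hex)

    def mark_good_shot(self) -> None:
        # The marker is only meaningful while a take is being recorded.
        if self._current is not None:
            self._current.good_shot = True

    def record_stop(self, time_code: str) -> None:
        # Capture the OUT point and store the completed take.
        if self._current is None:
            return
        self._current.out_point = time_code
        self.takes.append(self._current)
        self._current = None


logger = TakeLogger()
logger.record_start("00:03:45:29")
logger.mark_good_shot()
logger.record_stop("00:04:21:05")
```

Keeping the marker on the same record as the time codes mirrors the text, in which the marker is stored with the
corresponding IN POINT and OUT POINT.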
[0031] As already indicated above, the PDA 112 is used to facilitate identification of the audio/video material
generated by the camera. To this end, the PDA is arranged to associate this audio/video material with pre-planned
events such as scenes, shots or takes. The camera and PDA shown in figures 1 and 2 form part of an integrated system
for planning, acquiring and editing an audio/video production. During a planning phase, the scenes which are required
in order to produce an audio/video production are identified. Furthermore for each scene a number of shots are
identified which are required in order to establish the scene. Within each shot, a number of takes may be generated and
from these takes a selected number may be used to form the shot for the final edit. The planning information in this
form is therefore identified at a planning stage. Data representing or identifying each of the planned scenes and shots
is therefore loaded into the PDA 112 along with notes which will assist the director when the audio/video material is
captured. An example of such data is shown in the table below.



A/V Production          News story: BMW disposes of Rover
Scene ID: 900015689     Outside Longbridge
Shot 5000000199         Longbridge BMW Sign
Shot 5000000200         Workers Leaving shift
Shot 5000000201         Workers in car park
Scene ID: 900015690     BMW HQ Munich
Shot 5000000202         Press conference
Shot 5000000203         Outside BMW building
Scene ID: 900015691     Interview with minister
Shot 5000000204         Interview

[0032] In the first column of the table above the event which will be captured by the camera and for which
audio/video material will be generated is shown. Each of the events, which are defined in a hierarchy, is provided with
an identification number. Correspondingly, in the second column notes are provided in order to direct or remind the
director of the content of the planned shot or scene. For example, in the first row the audio/video production is
identified as being a news story, reporting the disposal of Rover by BMW. In the extract of the planning information
shown in the table above, there are three scenes, each of which is provided with a unique identification number. These
scenes are "Outside Longbridge", "BMW HQ Munich" and "Interview with minister". Correspondingly for each scene a
number of shots are identified and these are shown below each of the scenes with a unique shot identification number.
Notes corresponding to the content of each of these shots are also entered in the second column. So, for example, for
the first scene "Outside Longbridge", three shots are identified which are "Longbridge BMW Sign", "Workers leaving
shift" and "Workers in car park". With this information loaded onto the PDA, the director or indeed a single camera man
may take the PDA out to the place where the news story is to be shot, so that the planned audio/video material can be
gathered. An illustration of the form of the PDA with the graphical user interface displaying this information is shown
in figure 3.
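
The hierarchy of production, scenes and shots described above could be held on a data processor in a form such as
the following. This is a minimal sketch: the class names, field names and the `find_shot` helper are assumptions
made for illustration, and only the ID numbers and notes are taken from the planning table.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Shot:
    shot_id: int
    note: str


@dataclass
class Scene:
    scene_id: int
    note: str
    shots: List[Shot] = field(default_factory=list)


@dataclass
class Production:
    note: str
    scenes: List[Scene] = field(default_factory=list)


plan = Production(
    note="News story: BMW disposes of Rover",
    scenes=[
        Scene(900015689, "Outside Longbridge", [
            Shot(5000000199, "Longbridge BMW Sign"),
            Shot(5000000200, "Workers leaving shift"),
            Shot(5000000201, "Workers in car park"),
        ]),
        Scene(900015690, "BMW HQ Munich", [
            Shot(5000000202, "Press conference"),
            Shot(5000000203, "Outside BMW building"),
        ]),
        Scene(900015691, "Interview with minister", [
            Shot(5000000204, "Interview"),
        ]),
    ],
)


def find_shot(production: Production, shot_id: int) -> Shot:
    """Return the planned shot with the given ID, or raise KeyError."""
    for scene in production.scenes:
        for shot in scene.shots:
            if shot.shot_id == shot_id:
                return shot
    raise KeyError(shot_id)
```

A lookup by shot ID is the operation needed when a shot ID number is communicated from one device to another and
the receiving side has to recover the associated notes.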
[0033] As indicated in figure 1, the PDA 112 is arranged to communicate data to the camera 101. To this end the
metadata generation processor 128 is arranged to communicate data with the PDA 112 via the interface 118. The
interface 118 may be, for example, an infra-red link 119 providing wireless communications in accordance with a known
standard. The PDA and the parts of the camera associated with generating metadata which are shown in figure 2 are
shown in more detail in figure 4.

[0034] In figure 4 the parts of the camera which are associated with generating metadata and communicating with
the PDA 112 are shown in a separate acquisition unit 152. However it will be appreciated that the acquisition unit 152
could also be embodied within the camera 102. The acquisition unit 152 comprises the metadata generation processor
128, and the data store 132. The acquisition unit 152 also includes the clock 136 and the sensors 138, 140, 142
although for clarity these are not shown in figure 4. Alternatively, some or all of these features which are shown in
figure 2 will be embodied within the camera 102 and the signals which are required to define the metadata such as the
time codes and the audio/video signals themselves may be communicated via a communications link 153 which is
coupled to an interface port 154. The metadata generation processor 128 is therefore provided with access to the time
codes and the audio/video material as well as other parameters used in generating the audio/video material. Signals
representing the time codes and parameters as well as the audio/video signals are received from the interface port 154
via the interface channel 156. The acquisition unit 152 is also provided with a screen (not shown) which is driven by a
screen driver 158. Also shown in figure 4 the acquisition unit is provided with a communications processor 160 which
is coupled to the metadata generation processor 128 via a connecting channel 162. Communication is effected by the
communications processor 160 via a radio frequency communications channel using the antenna 164. A pictorial
representation of the acquisition unit 152 is shown in figure 5.

[0035] The PDA 112 is also shown in figure 4. The PDA 112 is correspondingly provided with an infra-red
communications port 165 for communicating data to and from the acquisition unit 152 via an infra-red link 119. A data
processor 166 within the PDA 112 is arranged to communicate data to and from the infra-red port 165 via a connecting
channel 166. The PDA 112 is also provided with a data store 167 and a screen driver 168 which are connected to the
data processor 166.

[0036] The pictorial representation of the PDA 112 shown in figure 3 and the acquisition unit shown in figure 5
provide an illustration of an example embodiment of the present invention. A schematic diagram illustrating the
arrangement and connection of the PDA 112 and the acquisition unit 152 is shown in figure 6. In the example shown in
figure 6 the acquisition unit 152 is mounted on the back of a camera 101 and coupled to the camera via a six pin remote
connector and to a connecting channel conveying the external signal representative of the time code recorded onto the
recording tape. Thus, the six pin remote connector and the time code indicated as arrow lines form the communications
channel 153 shown in figure 4. The interface port 154 is shown in figure 6 to be a VA to DN1 conversion comprising an
RM-P9/LTC to RS422 converter 154. RM-P9 is a camera remote control protocol, whereas LTC is Linear Time Code in
the form of an analogue signal. This is arranged to communicate with an RS422 to RS232 converter 154" via a
connecting channel which forms part of the interface port 154. The converter 154" then communicates with the metadata
generation processor 128 via the connecting channel 156 which operates in accordance with the RS232 standard.

[0037] Returning to figure 4, the PDA 112 which has been loaded with the pre-planned production information is
arranged to communicate the current scene and shot for which audio/video material is to be generated by
communicating the next shot ID number via the infra-red link 119. The pre-planned information may also have been
communicated to the acquisition unit 152 and stored in the data store 132 via a separate link or via the infra-red
communication link 119. However in effect the acquisition unit 152 is directed to generate metadata in association with
the scene or shot ID number which is currently being taken. After receiving the information of the current shot the
camera 102 is then operated to make a "take" of the shot. The audio/video material of the take is recorded onto the
recording tape 126 with corresponding time codes. These time codes are received along with the audio/video material
via the interface port 154 at the metadata generation processor 128. The metadata generation processor 128 having
been informed of the current pre-planned shot now being taken logs the time codes for each take of the shot. The
metadata generation processor therefore logs the IN and OUT time codes of each take and stores these in the data
store 132.
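
Logged IN and OUT time codes lend themselves to simple arithmetic, such as working out the length of a take. The
sketch below assumes time codes of the form hours:minutes:seconds:frames, a nominal rate of 30 frames per second
and non-drop-frame counting; none of these is stated in the text, and the function names are hypothetical.

```python
FPS = 30  # assumed nominal frame rate, non-drop-frame counting


def to_frames(time_code: str, fps: int = FPS) -> int:
    """Convert 'HH:MM:SS:FF' into a frame count from 00:00:00:00."""
    hours, minutes, seconds, frames = (int(p) for p in time_code.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def to_time_code(total: int, fps: int = FPS) -> str:
    """Convert a frame count back into 'HH:MM:SS:FF'."""
    seconds, frames = divmod(total, fps)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


def take_duration(in_point: str, out_point: str, fps: int = FPS) -> str:
    """Difference between the OUT and IN points, as a time code."""
    return to_time_code(to_frames(out_point, fps) - to_frames(in_point, fps), fps)


duration = take_duration("00:03:45:29", "00:04:21:05")
```

Under these assumptions a take with IN 00:03:45:29 and OUT 00:04:21:05 spans 1056 frames, that is 35 seconds and
6 frames.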

[0038] The information generated and logged by the metadata generation processor 128 is shown in the table
below. In the first column the scene and shot are identified with the corresponding ID numbers, and for each shot
several takes are made by the camera operator which are indicated in a hierarchical fashion. Thus, having received
information from the PDA 112 of the current shot, each take made by the camera operator is logged by the metadata
generation processor 128 and the IN and OUT points for this take are shown in the second and third columns and
stored in the data store 132. This information may also be displayed on the screen of the acquisition unit 152 as shown
in figure 5. Furthermore, the metadata generation processor 128 as already explained generates the UMID for each
take for the audio/video material generated during the take. The UMID for each take forms the fourth column of the
table. Additionally, in some embodiments, to provide a unique identification of the tape on which the material is
recorded, a tape identification is generated and associated with the metadata. The tape identification may be written on
to the tape, or stored on a random access memory chip which is embodied within the video tape cassette body. This
random access memory chip is known as a TELEFILE (RTM) system which provides a facility for reading the tape ID
number remotely. The tape ID is written onto the magnetic tape 126 to uniquely identify this tape. In preferred
embodiments the TELEFILE (RTM) system is provided with a unique number which is manufactured as part of the
memory and so can be used as the tape ID number. In other embodiments the TELEFILE (RTM) system provides
automatically the IN/OUT time codes of the recorded audio/video material items.

[0039] In one embodiment the information shown in the table below is arranged to be recorded onto the magnetic
tape in a separate recording channel. However, in other embodiments the metadata shown in the table is
communicated separately from the tape 126 using either the communications processor 160 or the infra-red link 119.
The metadata may be received by the PDA 112 for analysis and may be further communicated by the PDA.

Scene ID: 900015689    Tape ID: 00001                           UMID:
Shot 5000000199
  Take 1               IN: 00:03:45:29    OUT: 00:04:21:05      060C23B340..
  Take 2               IN: 00:04:21:20    OUT: 00:04:28:15      060C23B340..
  Take 3               IN: 00:04:28:20    OUT: 00:05:44:05      060C23B340..
Shot 5000000200
  Take 1               IN: 00:05:44:10    OUT: 00:08:22:05      060C23B340..
  Take 2               IN: 00:08:22:10    OUT: 00:08:23:05      060C23B340..

[0040] The communications processor 160 may be arranged in operation to transmit the metadata generated by
the metadata generation processor 128 via a wireless communications link. The metadata may be received via the
wireless communications link by a remotely located studio which can then acquire the metadata and process this
metadata ahead of the audio/video material recorded onto the magnetic tape 126. This provides an advantage in
improving the rate at which the audio/video production may be generated during the post production phase in which the
material is edited.

[0041] A further advantageous feature provided by embodiments of the present invention is an arrangement in
which a picture stamp is generated at certain temporal positions within the recorded audio/video signals. A picture
stamp is known to those skilled in the art as being a digital representation of an image and in the present example
embodiment is generated from the moving video material generated by the camera. The picture stamp may be of lower
quality in order to reduce the amount of data required to represent the image from the video signals. Therefore the
picture stamp may be compression encoded which may result in a reduction in quality. However a picture stamp
provides a visual indication of the content of the audio/video material and therefore is a valuable item of metadata.
Thus, the picture stamp may for example be generated at the IN and OUT time codes of a particular take. The picture
stamps may be associated with the metadata generated by the metadata generation processor 128 and stored in the
data store 132. The picture stamps are therefore associated with items of metadata such as, for example, the time
codes which identify the place on the tape where the image represented by the picture stamp is recorded. The picture
stamps may be generated with the "Good Shot" markers. The picture stamps are generated by the metadata generation
processor 128 from the audio/video signals received via the communications link 153. The metadata generation
processor therefore operates to effect a data sampling and compression encoding process in order to produce the
picture stamps. Once the picture stamps have been generated they can be used for several purposes. They may be
stored in a data file and communicated separately from the tape 126, or they may be stored on the tape 126 in
compressed form in a separate recording channel. Alternatively in preferred embodiments picture stamps may be
communicated using the communications processor 160 to the remotely located studio where a producer may analyse
the picture stamps. This provides the producer with an indication as to whether the audio/video material generated by
the camera operator is in accordance with what is required.
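
The data sampling and compression encoding mentioned above might be sketched as below. The frame layout (rows of
8-bit samples), the sampling step and the use of `zlib` are all assumptions chosen for brevity; the text does not
name a particular sampling scheme or codec. Note that `zlib` itself is lossless, so in this sketch any loss of
quality comes from the subsampling alone.

```python
import zlib
from typing import List

Frame = List[List[int]]  # rows of 8-bit samples (assumed format)


def make_picture_stamp(frame: Frame, step: int = 4) -> bytes:
    """Keep every `step`-th sample in each direction, then compress."""
    reduced = [row[::step] for row in frame[::step]]
    raw = bytes(value for row in reduced for value in row)
    return zlib.compress(raw)


# A synthetic 64 x 48 frame stands in for a captured video image.
frame = [[(x + y) % 256 for x in range(64)] for y in range(48)]
stamp = make_picture_stamp(frame)
```

With a step of 4 the stamp carries one sixteenth of the original samples before compression, which illustrates why
picture stamps are suited to a lower bandwidth route than the full quality material.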

[0042] In a yet further embodiment, the picture stamps are communicated to the PDA 112 and displayed on the
PDA screen. This may be effected via the infra-red link 119 or the PDA may be provided with a further wireless link
which can communicate with the communications processor 160. In this way a director having the hand held PDA 112
is provided with an indication of the current audio/video content generated by the camera. This provides an immediate
indication of the artistic and aesthetic quality of the audio/video material currently being generated. As already
explained the picture stamps are compression encoded so that they may be rapidly communicated to the PDA.

[0043] A further advantage of the acquisition unit 152 shown in figure 4 is that the editing process is made more
efficient by providing the editor at a remotely located studio with an indication of the content of the audio/video material
in advance of receiving that material. This is because the picture stamps are communicated with the metadata via a
wireless link so that the editor is provided with an indication of the content of the audio/video material in advance of
receiving the audio/video material itself. In this way the bandwidth of the audio/video material can remain high with a
correspondingly high quality whilst the metadata and picture stamps are at a relatively low bandwidth providing
relatively low quality information. As a result of the low bandwidth the metadata and picture stamps may be
communicated via a wireless link on a considerably lower bandwidth channel. This facilitates rapid communication of
the metadata describing content of the audio/video material.
[0044] The picture stamps generated by the metadata generation processor 128 can be at any point during the
recorded audio/video material. In one embodiment the picture stamps are generated at the IN and OUT points of each
take. However in other embodiments of the present invention an activity processor 170 is arranged to detect relative
activity within the video material. This is effected by performing a process in which a histogram of the colour
components of the images represented by the video signal is compiled and the rate of change of the colour components
determined and changes in these colour components used to indicate activity within the image. Alternatively or in
addition, motion vectors within the image are used to indicate activity. The activity processor 170 then operates to
generate a signal indicative of the relative activity within the video material. The metadata generation processor 128
then operates in response to the activity signal to generate picture stamps such that more picture stamps are generated
for greater activity within the images represented by the video signals.
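
One way to picture the histogram-based activity measure is the sketch below. It is an illustration only: it uses a
single 8-bit component per frame rather than several colour components, frames are assumed to be equal-length
sequences of samples, and the bin count, the normalisation and the mapping from activity to stamp interval are
assumptions rather than details taken from the text.

```python
from typing import List, Sequence


def histogram(frame: Sequence[int], bins: int = 8) -> List[int]:
    """Count 8-bit sample values into equal-width bins."""
    counts = [0] * bins
    width = 256 // bins
    for value in frame:
        counts[min(value // width, bins - 1)] += 1
    return counts


def activity(prev: Sequence[int], curr: Sequence[int]) -> float:
    """Histogram change between two equal-length frames, scaled to [0, 1]."""
    h_prev, h_curr = histogram(prev), histogram(curr)
    diff = sum(abs(a - b) for a, b in zip(h_prev, h_curr))
    return diff / (2 * max(len(curr), 1))


def stamp_interval(level: float, slow: int = 100, fast: int = 10) -> int:
    """Frames between picture stamps: a shorter interval for more activity."""
    return round(slow - (slow - fast) * level)


static = [10] * 100      # a dark, unchanging frame
changed = [200] * 100    # a frame whose samples have all moved to a bright bin
```

Here an unchanged frame gives an activity of 0 and the longest interval, while a frame whose histogram has moved
entirely gives an activity of 1 and the shortest interval, so more picture stamps are produced where activity is
greater.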

[0045] In an alternative embodiment of the present invention the activity processor 170 is arranged to receive the
audio signals via the connecting channel 172 and to recognise speech within the audio signals. The activity processor
170 then generates content data representative of the content of this speech as text. The text data is then
communicated to the metadata generation processor 128 and may be stored in the data store 132 or communicated with
other metadata via the communications processor 160 in a similar way to that already explained for the picture stamps.

[0046] Figure 7 provides a schematic representation of a post production process in which the audio/video material
is edited to produce an audio/video program. As shown in figure 7 the metadata, which may include picture stamps
and/or the speech content information, is communicated from the acquisition unit 152 via a separate route represented
by a broken line 174, to a metadata database 176. The route 174 may be representative of a wireless communications
link formed by, for example, UMTS, GSM or the like.

[0047] The database 176 stores metadata to be associated with the audio/video material. The audio/video material
in high quality form is recorded onto the tape 126. Thus the tape 126 is transported back to the editing suite where it is
ingested by an ingestion processor 178. The tape identification (tape ID) recorded onto the tape 126 or other metadata
providing an indication of the content of the audio/video material is used to associate the metadata stored in the data
store 176 with the audio/video material on the tape as indicated by the broken line 180.

[0048] As will be appreciated, although the example embodiment of the present invention uses a video tape as the
recording medium for storing the audio/video signals, alternative recording media such as magnetic disks and random
access memories may also be used.

Ingestion Processor

[0049] Figure 7 provides a schematic representation of a post production process in which the audio/video material
is edited to produce an audio/video program. As shown in figure 7 the metadata, which may include picture stamps
and/or the speech content information, is communicated from the acquisition unit 152 via a separate route represented
by a broken line 174, to a metadata database 176. The route 174 may be representative of a wireless communications
link formed by, for example, UMTS, GSM or the like.

[0050] The database 176 stores metadata to be associated with the audio/video material. The audio/video material
in high quality form is recorded onto the tape 126. Thus the tape 126 is transported back to the editing suite where it is
ingested by an ingestion processor 178. The tape identification (tape ID) recorded onto the tape 126 or other metadata
providing an indication of the content of the audio/video material is used to associate the metadata stored in the data
store 176 with the audio/video material on the tape as indicated by the broken line 180.

[0051] The ingestion processor 178 is also shown in Figure 7 to be connected to a network formed from a
communications channel represented by a connecting line 182. The connecting line 182 represents a communications channel
for communicating data to items of equipment, which form an inter-connected network. To this end, these items of
equipment are provided with a network card which may operate in accordance with a known access technique such as
Ethernet, RS422 and the like. Furthermore, as will be explained shortly, the communications network 182 may also
provide data communications in accordance with the Serial Digital Interface (SDI) or the Serial Digital Transport Interface
(SDTI).

[0052] Also shown connected to the communications network 182 are the metadata database 176 and an
audio/video server 190, into which the audio/video material is ingested. Furthermore, editing terminals 184, 186 are
also connected to the communications channel 182 along with a digital multi-effects processor 188.
[0053] The communications network 182 provides access to the audio/video material present on tapes, discs or 
other recording media which are loaded into the ingestion processor 178. 

[0054] The metadata database 176 is arranged to receive metadata via the route 174 describing the content of the
audio/video material recorded on to the recording media loaded into the ingestion processor 178.

[0055] As will be appreciated, although in the example embodiment a video tape has been used as the recording
medium for storing the audio/video signals, alternative recording media such as magnetic
disks and random access memories may also be used, and video tape is provided as an illustrative example only.
[0056] The editing terminals 184, 186 and the digital multi-effects processor 188 are provided with access to the
audio/video material recorded on to the tapes loaded into the ingestion processor 178 and the metadata describing this
audio/video material stored in the metadata database 176 via the communications network 182. The operation of the
ingestion processor 178 in combination with the metadata database 176 will now be described in more detail.

[0057] Figure 8 provides an example representation of the ingestion processor 178. In Figure 8 the ingestion
processor 178 is shown to have a jog shuttle control 200 for navigating through the audio/video material recorded on the
tapes loaded into video tape recorders/reproducers forming part of the ingestion processor 178. The ingestion
processor 178 also includes a display screen 202 which is arranged to display picture stamps which describe selected parts
of the audio/video material. The display screen 202 also acts as a touch screen providing a user with the facility for
selecting the audio/video material by touch. The ingestion processor 178 is also arranged to display all types of
metadata on the screen 202, including script, camera type, lens types and UMIDs.

[0058] As shown in Figure 9, the ingestion processor 178 may include a plurality of video tape
recorders/reproducers into which the video tapes onto which the audio/video material is recorded may be loaded in parallel. In the example
shown in figure 9, the video tape recorders 204 are connected to the ingestion processor 178 via an RS422 link and an
SDI IN/OUT link. The ingestion processor 178 therefore represents a data processor which can access any of the video
tape recorders 204 in order to reproduce the audio/video material from the video tapes loaded into the video tape
recorders. Furthermore, the ingestion processor 178 is provided with a network card in order to access the
communications network 182. As will be appreciated from Figure 9, however, the communications channel 182 is comprised of a
relatively low band width data communications channel 182' and a high band width SDI channel 182" for use in
streaming video data. Correspondingly, the ingestion processor 178 is connected to the video tape recorders 204 via
an RS422 link in order to communicate requests for corresponding items of audio/video material. Having requested these
items of audio/video material, the audio/video material is communicated back to the ingestion processor 178 via an SDI
communication link 206 for distribution via the SDI network. The requests may, for example, include the UMID which
uniquely identifies the audio/video material item(s).

[0059] The operation of the ingestion processor in association with the metadata database 176 will now be
explained with reference to figure 10. In figure 10 the metadata database 176 is shown to include a number of items of
metadata 210 associated with a particular tape ID 212. As shown by the broken line headed arrow 214, the tape ID 212
identifies a particular video tape 216, on which the audio/video material corresponding to the metadata 210 is recorded.
In the example embodiment shown in Figure 10, the tape ID 212 is written onto the video tape 218 in the linear time
code area 220. However, it will be appreciated that in other embodiments, the tape ID could be written in other places
such as the vertical blanking portion. The video tape 216 is loaded into one of the video tape recorders 204 forming part
of the ingestion processor 178.

[0060] In operation, one of the editing terminals 184 is arranged to access the metadata database 176 via the low
band width communications channel 182'. The editing terminal 184 is therefore provided with access to the metadata 210
describing the content of the audio/video material recorded onto the tape 216. The metadata 210 may include items such as
the copyright owner "BSkyB", the resolution of the picture and the format in which the video material is encoded, the
name of the program, which is in this case "Grandstand", and information such as the date, time and audience.
Metadata may further include a note of the content of the audio/video material.

[0061] Each of the items of audio/video material is associated with a UMID, which identifies the audio/video
material. As such, the editing terminal 184 can be used to identify and select from the metadata 210 the items of audio/video
material which are required in order to produce a program. This material may be identified by the UMID associated with
the material. In order to access the audio/video material to produce the program, the editing terminal 184
communicates a request for this material via the low band width communications network 182. The request includes the UMID
or the UMIDs identifying the audio/video material item(s). In response to the request for audio/video material received
from the editing terminal 184, the ingestion processor 178 is arranged to reproduce selectively these audio/video
material items identified by the UMID or UMIDs from the video tape recorder into which the video cassette 216 is loaded.
This audio/video material is then streamed via the SDI network 182" back to the editing terminal 184 to be incorporated
into the audio/video production being edited. The streamed audio/video material is ingested into the audio/video server
190, from where the audio/video material can be stored and reproduced.
[0062] Figure 11 provides an alternative arrangement in which the metadata 210 is recorded onto a suitable
recording medium with the audio/video material. For example, the metadata 210 could be recorded in one of the audio tracks
of the video tape 218'. Alternatively, the recording medium may be an optical disc or magnetic disc allowing random
access and providing a greater capacity for storing data. In this case the metadata 210 may be stored with the
audio/video material.

[0063] In a yet further arrangement, some or all of the metadata may be recorded onto the tape 216. This may be
recorded, for example, into the linear recording track of the tape 218. Some metadata related to the metadata recorded
onto the tape may be conveyed separately and stored in the database 176. A further step is required in order to ingest
the metadata and to this end the ingestion processor 178 is arranged to read the metadata from the recording medium
218' and convey the metadata via the communications network 182' to the metadata database 176. Therefore, it will be
appreciated that the metadata associated with the audio/video material to be ingested by the ingestion processor 178
may be ingested into the database 176 via a separate medium or via the recording medium on which the audio/video
material is also recorded.

[0064] The metadata associated with the audio/video material may also include picture stamps which represent low
quality representations of the images at various points throughout the video material. These may be presented at the
touch screen 202 on the ingestion processor 178. Furthermore, these picture stamps may be conveyed via the network
182' to the editing terminals 184, 186 or the effects processor 188 to provide an indication of the content of the
audio/video material. The editor is therefore provided with a pictorial representation of the audio/video material and
from this a selection of audio/video material items may be made. Furthermore, the picture stamps may be stored in the
database 176 as part of the metadata 210. The editor may therefore retrieve a selected item for the corresponding
picture stamp using the UMID which is associated with the picture stamp.

[0065] In other embodiments of the invention, the recording medium may not have sufficient capacity to include
picture stamps recorded with the audio/video material. This is likely to be so if the recording medium is a video tape 216.
It is particularly appropriate in this case, although not exclusively so, to generate picture stamps before or during
ingestion of the audio/video material.

[0066] Returning to figure 7, in other embodiments, the ingestion processor 178 may include a pre-processing unit.
The pre-processing unit embodied within the ingestion processor 178 is arranged to receive the audio/video material
recorded onto the recording medium which, in the present example, is a video tape 126. To this end, the pre-processing
unit may be provided with a separate video recorder/reproducer or may be combined with the video tape
recorder/reproducer which forms part of the ingestion processor 178. The pre-processing unit generates picture stamps
associated with the audio/video material. As explained above, the picture stamps are used to provide a pictorial
representation of the content of the audio/video material items. However, in accordance with a further embodiment of the
present invention, the pre-processing unit operates to process the audio/video material and generate an activity
indicator representative of relative activity within the content of the audio/video material. This may be achieved, for example,
using a processor which operates to generate an activity signal in accordance with a histogram of colour components
within the images represented by the video signal and to generate the activity signals in accordance with a rate of
change of the colour histogram components. The pre-processing unit then operates to generate a picture stamp at
points throughout the video material where there are periods of activity indicated by the activity signal. This is
represented in Figure 12. In Figure 12A, picture stamps 224 are shown to be generated along a line 226 which represents
time within the video signal. As shown in figure 12A, the picture stamps 224 are generated at times along the time line
226 where the activity signal, represented as arrows 228, indicates events of activity. This might be, for example, someone
walking into and out of the field of view of the camera, where there is a great deal of motion represented by the video
signal. To this end, the activity signal may also be generated using motion vectors which may be, for example, the
motion vectors generated in accordance with the MPEG standard.
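
By way of illustration only, the histogram-based activity signal described above may be sketched as follows. The frame representation (a flat sequence of 8-bit colour component samples), the number of bins, the sum of absolute differences as the measure of change, the threshold, and all function names are assumptions made for this sketch and are not taken from the embodiment.

```python
from typing import List, Sequence

# Assumed representation: one frame is a flat sequence of 8-bit colour samples.
Frame = Sequence[int]


def colour_histogram(frame: Frame, bins: int = 8) -> List[float]:
    """Return a normalised histogram of the sample values in one frame."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    total = float(len(frame)) or 1.0
    return [c / total for c in counts]


def activity_signal(frames: Sequence[Frame], bins: int = 8) -> List[float]:
    """Rate of change of the colour histogram between consecutive frames."""
    hists = [colour_histogram(f, bins) for f in frames]
    if not hists:
        return []
    signal = [0.0]  # the first frame has no predecessor
    for prev, cur in zip(hists, hists[1:]):
        signal.append(sum(abs(a - b) for a, b in zip(prev, cur)))
    return signal


def picture_stamp_points(frames: Sequence[Frame], threshold: float = 0.5) -> List[int]:
    """Frame indices at which a picture stamp would be generated."""
    return [i for i, a in enumerate(activity_signal(frames)) if a >= threshold]


if __name__ == "__main__":
    dark = [10] * 64
    bright = [240] * 64
    # Activity occurs where the picture content changes abruptly.
    print(picture_stamp_points([dark, dark, bright, bright, dark]))
```

In this sketch a picture stamp is generated only at frames where the histogram changes by at least the assumed threshold; a motion-vector based signal could be substituted for `activity_signal` without changing the selection step.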

[0067] In other embodiments of the invention, the pre-processor may generate textual information corresponding to
speech present within the audio signal forming part of the audio/video material items stored on the tape 126. The
textual information may be generated instead of the picture stamps or in addition to the picture stamps. In this case, text
may be generated, for example, for the first words of sentences and/or the first activity of a speaker. This is detected from
the audio signals present on the tape recording or forming part of the audio/video material. The start points where text
is to be generated are represented along the time line 226 as arrows 230. Alternatively, the text could be generated at the
end of sentences or indeed at other points of interest within the speech.

[0068] At the detected start of the speech, a speech processor operates to generate a textual representation of the
content of the speech. To this end, the time line 226 shown in Figure 12B is shown to include the text 232 corresponding
to the content of the speech at the start of activity periods of speech.

[0069] The picture stamps and textual representation of the speech activity generated by the pre-processor are
communicated via the communications channel 182 to the metadata database 176 and stored. The picture stamps and text
are stored in association with the UMID identifying the corresponding items of audio/video material from which the
picture stamps 224 and the textual information 232 were generated. This therefore provides a facility for an editor operating
one of the editing terminals 184, 186 to analyse the content of the audio/video material before it is ingested using the
ingestion processor 178. As such, the video tape 126 is loaded into the ingestion processor 178 and thereafter the
audio/video material can be accessed via the network communications channel 182. The editor is therefore provided
with an indication, very rapidly, of the content of the audio/video material and so may ingest only those parts of the
material which are relevant to the particular material items required by the editor. This has a particular advantage in
improving the efficiency with which the editor may produce an audio/video production.

[0070] In an alternative embodiment, the pre-processor may be a separate unit and may be provided with a screen 
on which the picture stamps and/or text information are displayed, and a means such as, for example, a touch screen, 
to provide a facility for selecting the audio/video material items to be ingested. 
[0071] In a further embodiment of the invention, the ingestion processor 178 generates metadata items such as
UMIDs whilst the audio/video material is being ingested. This may be required because the acquisition unit in the camera
152 is not arranged to generate UMIDs, but does generate a Unique Material Reference Number (MURN). The MURN
is generated for each material item, such as a take. The MURN is arranged to be considerably shorter than a UMID and
can therefore be accommodated within the linear time code of a video tape, which is more difficult for UMIDs because
these are larger. Alternatively, the MURN may be written into a TELEFILE (RTM) label of the tape. The MURN provides
a unique identification of the audio/video material items present on the tape. The MURNs may be communicated
separately to the database 176 as indicated by the line 174.

[0072] At the ingestion processor 178, the MURNs for the material items are recovered from the tape or the
TELEFILE label. For each MURN, the ingestion processor 178 operates to generate a UMID corresponding to the MURN.
The UMIDs are then communicated with the MURNs to the database 176, and are ingested into the database in
association with the MURNs, which may be already present within the database 176.
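
As a minimal sketch of the association described above, the following generates a 32-byte basic UMID for each recovered MURN and stores it against any record already held for that MURN. The dictionary standing in for the database 176, the function names, the use of a random UUID as the material number, and the placeholder values for label bytes 11 and 12 are all assumptions for illustration; the label bytes 1 to 10 are the example values given in Table 1.

```python
import uuid
from typing import Dict, Iterable

# Example label bytes 1-10 from Table 1; byte 11 (01h, picture) and
# byte 12 (00h) are placeholder choices for this sketch.
UNIVERSAL_LABEL = bytes(
    [0x06, 0x0C, 0x2B, 0x34, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x00]
)


def new_basic_umid() -> bytes:
    """Build a 32-byte basic UMID: label, length, instance, material."""
    length = bytes([0x13])         # length value for a basic UMID
    instance = bytes(3)            # zero instance number for new material
    material = uuid.uuid4().bytes  # assumption: stand-in for the material number method
    return UNIVERSAL_LABEL + length + instance + material


def ingest_murns(murns: Iterable[str], database: Dict[str, dict]) -> None:
    """Associate a UMID with each MURN, keeping metadata already stored."""
    for murn in murns:
        record = database.setdefault(murn, {})
        record.setdefault("umid", new_basic_umid())
```

Metadata communicated earlier via the separate route remains in the record, so the UMID is simply added to whatever is already keyed by the MURN.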

Camera Metadata



[0073] The following is provided, by way of example, to illustrate the possible types of metadata generated during 
the production of a programme, and one possible organisational approach to structuring that metadata. 
[0074] Figure 13 illustrates an example structure for organising metadata. A number of tables each comprising a
number of fields containing metadata are provided. The tables may be associated with each other by way of common
fields within the respective tables, thereby providing a relational structure. Also, the structure may comprise a number
of instances of the same table to represent multiple instances of the object that the table may represent. The fields may
be formatted in a predetermined manner. The size of the fields may also be predetermined. Example sizes include "Int"
which represents 2 bytes, "Long Int" which represents 4 bytes and "Double" which represents 8 bytes. Alternatively, the
size of the fields may be defined with reference to the number of characters to be held within the field such as, for
example, 8, 10, 16, 32, 128, and 255 characters.

[0075] Turning to the structure in more detail, there is provided a Programme Table. The Programme Table com- 
prises a number of fields including Programme ID (PID), Title, Working Title, Genre ID, Synopsis, Aspect Ratio, Director 
ID and Picturestamp. Associated with the Programme Table is a Genre Table, a Keywords Table, a Script Table, a Peo- 
ple Table, a Schedule Table and a plurality of Media Object Tables. 

[0076] The Genre Table comprises a number of fields including Genre ID, which is associated with the Genre ID
field of the Programme Table, and Genre Description. 
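
The association of tables by way of a common field may be illustrated, purely as a sketch, using an in-memory SQLite database. Only fields named above are used; the column types, the SQL naming, and the sample genre value 'Sport' are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE genre (
        genre_id          INTEGER PRIMARY KEY,
        genre_description TEXT
    );
    CREATE TABLE programme (
        programme_id INTEGER PRIMARY KEY,
        title        TEXT,
        genre_id     INTEGER REFERENCES genre(genre_id)  -- common field
    );
    """
)
conn.execute("INSERT INTO genre VALUES (1, 'Sport')")
conn.execute("INSERT INTO programme VALUES (10, 'Grandstand', 1)")

# Follow the common Genre ID field from the Programme Table to the Genre Table.
row = conn.execute(
    "SELECT p.title, g.genre_description "
    "FROM programme p JOIN genre g ON p.genre_id = g.genre_id"
).fetchone()
print(row)
```

The remaining tables of the structure could be related in the same manner through their respective ID fields.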

[0077] The Keywords Table comprises a number of fields including Programme ID, which is associated with the 
Programme ID field of the Programme Table, Keyword ID and Keyword. 

[0078] The Script Table comprises a number of fields including Script ID, Script Name, Script Type, Document
Format, Path, Creation Date, Original Author, Version, Last Modified, Modified By, PID associated with Programme ID and
Notes. The People Table comprises a number of fields including Image.

[0079] The People Table is associated with a number of Individual Tables and a number of Group Tables. Each Indi- 
vidual Table comprises a number of fields including Image. Each Group Table comprises a number of fields including 
Image. Each Individual Table is associated with either a Production Staff Table or a Cast Table. 
[0080] The Production Staff Table comprises a number of fields including Production Staff ID, Surname, Firstname,
Contract ID, Agent, Agency ID, E-mail, Address, Phone Number, Role ID, Notes, Allergies, DOB, National Insurance 
Number and Bank ID and Picture Stamp. 

[0081] The Cast Table comprises a number of fields including Cast ID, Surname, Firstname, Character Name,
Contract ID, Agent, Agency ID, Equity Number, E-mail, Address, Phone Number, DOB and Bank ID and Picture Stamp.
Associated with the Production Staff Table and Cast Table are a Bank Details Table and an Agency Table.

[0082] The Bank Details Table comprises a number of fields including Bank ID, which is associated with the Bank 
ID field of the Production Staff Table and the Bank ID field of the Cast Table, Sort Code, Account Number and Account 
Name. 

[0083] The Agency Table comprises a number of fields including Agency ID, which is associated with the Agency
ID field of the Production Staff Table and the Agency ID field of the Cast Table, Name, Address, Phone Number, Web
Site and E-mail and a Picture Stamp. Also associated with the Production Staff Table is a Role Table.
[0084] The Role Table comprises a number of fields including Role ID, which is associated with the Role ID field of 
the Production Staff Table, Function and Notes and a Picture Stamp. Each Group Table is associated with an Organi- 
sation Table. 

[0085] The Organisation Table comprises a number of fields including Organisation ID, Name, Type, Address,
Contract ID, Contact Name, Contact Phone Number and Web Site and a Picture Stamp.

[0086] Each Media Object Table comprises a number of fields including Media Object ID, Name, Description,
Picturestamp, PID, Format, Schedule ID, Script ID and Master ID. Associated with each Media Object Table is the People
Table, a Master Table, a Schedule Table, a Storyboard Table, a Script Table and a number of Shot Tables.
[0087] The Master Table comprises a number of fields including Master ID, which is associated with the Master ID 
field of the Media Object Table, Title, Basic UMID, EDL ID, Tape ID and Duration and a Picture Stamp. 
[0088] The Schedule Table comprises a number of fields including Schedule ID, Schedule Name, Document
Format, Path, Creation Date, Original Author, Start Date, End Date, Version, Last Modified, Modified By and Notes and PID
which is associated with the Programme ID.

[0089] The Contract Table contains: a Contract ID which is associated with the Contract ID of the Production Staff,
Cast and Organisation Tables; commencement date, rate, job title, expiry date and details.

[0090] The Storyboard Table comprises a number of fields including Storyboard ID, which is associated with the
Storyboard ID of the Shot Table, Description, Author, Path and Media ID.

[0091] Each Shot Table comprises a number of fields including Shot ID, PID, Media ID, Title, Location ID, Notes,
Picturestamp, Script ID, Schedule ID, and Description. Associated with each Shot Table is the People Table, the
Schedule Table, the Script Table, a Location Table and a number of Take Tables.

[0092] The Location Table comprises a number of fields including Location ID, which is associated with the Location
ID field of the Shot Table, GPS, Address, Description, Name, Cost Per Hour, Directions, Contact Name, Contact
Address and Contact Phone Number and a Picture Stamp.

[0093] Each Take Table comprises a number of fields including Basic UMID, Take Number, Shot ID, Media ID,
Timecode IN, Timecode OUT, Sign Metadata, Tape ID, Camera ID, Head Hours, Videographer, IN Stamp, OUT Stamp, Lens
ID, AUTO ID, Ingest ID and Notes. Associated with each Take Table is a Tape Table, a Task Table, a Camera Table, a Lens
Table, an Ingest Table and a number of Take Annotation Tables.

[0094] The Ingest Table contains an Ingest ID, which is associated with the Ingest ID in the Take Table, and a
description.

[0095] The Tape Table comprises a number of fields including Tape ID, which is associated with the Tape ID field of
the Take Table, PID, Format, Max Duration, First Usage, Max Erasures, Current Erasure, ETA (estimated time of arrival)
and Last Erasure Date and a Picture Stamp.

[0096] The Task Table comprises a number of fields including Task ID, PID, Media ID, Shot ID, which are associated
with the Media ID and Shot ID fields respectively of the Take Table, Title, Task Notes, Distribution List and CC List.
Associated with the Task Table is a Planned Shot Table.

[0097] The Planned Shot Table comprises a number of fields including Planned Shot ID, PID, Media ID, Shot ID,
which are associated with the PID, Media ID and Shot ID respectively of the Task Table, Director, Shot Title, Location,
Notes, Description, Videographer, Due Date, Programme Title, Media Title, Aspect Ratio and Format.

[0098] The Camera Table comprises a number of fields including Camera ID, which is associated with the Camera
ID field of the Take Table, Manufacturer, Model, Format, Serial Number, Head Hours, Lens ID, Notes, Contact Name,
Contact Address and Contact Phone Number and a Picture Stamp.
[0099] The Lens Table comprises a number of fields including Lens ID, which is associated with the Lens ID field of
the Take Table, Manufacturer, Model, Serial Number, Contact Name, Contact Address and Contact Phone Number and
a Picture Stamp.

[0100] Each Take Annotation Table comprises a number of fields including Take Annotation ID, Basic UMID,
Timecode, Shutter Speed, Iris, Zoom, Gamma, Shot Marker ID, Filter Wheel, Detail and Gain. Associated with each Take
Annotation Table is a Shot Marker Table.

[0101] The Shot Marker Table comprises a number of fields including Shot Marker ID, which is associated with the 
Shot Marker ID of the Take Annotation Table, and Description. 

UMID Description 


[0102] A UMID is described in SMPTE Journal March 2000 which provides details of the UMID standard. Referring
to figures 14 and 15, a basic and an extended UMID are shown. The extended UMID comprises a first set of 32 bytes of
basic UMID and a second set of 32 bytes of signature metadata.

[0103] The first set of 32 bytes is the basic UMID. The components are:

• A 12-byte Universal Label to identify this as a SMPTE UMID. It defines the type of material which the UMID
identifies and also defines the methods by which the globally unique Material and locally unique Instance numbers are
created.

• A 1-byte length value to define the length of the remaining part of the UMID.

• A 3-byte Instance number which is used to distinguish between different 'instances' of material with the same
Material number.

• A 16-byte Material number which is used to identify each clip. Each Material number is the same for related
instances of the same material.
[0104] The second set of 32 bytes is the signature metadata, a set of packed metadata items used to create an
extended UMID. The extended UMID comprises the basic UMID followed immediately by signature metadata which
comprises:

• An 8-byte time/date code identifying the time and date of the Content Unit creation.

• A 12-byte value which defines the spatial co-ordinates at the time of Content Unit creation.

• 3 groups of 4-byte codes which register the country, organisation and user codes.
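
The byte layout listed above can be sketched as a simple slicing routine. The class and function names are hypothetical, and the sketch assumes the components appear in the order in which they are listed; it does not interpret the contents of any field.

```python
from typing import NamedTuple


class ExtendedUmid(NamedTuple):
    universal_label: bytes  # 12 bytes
    length: int             # 1 byte
    instance_number: bytes  # 3 bytes
    material_number: bytes  # 16 bytes
    time_date: bytes        # 8 bytes
    spatial: bytes          # 12 bytes
    country: bytes          # 4 bytes
    organisation: bytes     # 4 bytes
    user: bytes             # 4 bytes


def parse_extended_umid(data: bytes) -> ExtendedUmid:
    """Split a 64-byte extended UMID into its components."""
    if len(data) != 64:
        raise ValueError("an extended UMID is 64 bytes long")
    return ExtendedUmid(
        universal_label=data[0:12],
        length=data[12],
        instance_number=data[13:16],
        material_number=data[16:32],
        time_date=data[32:40],
        spatial=data[40:52],
        country=data[52:56],
        organisation=data[56:60],
        user=data[60:64],
    )
```

The first four components total 12 + 1 + 3 + 16 = 32 bytes (the basic UMID) and the remaining five total 8 + 12 + 4 + 4 + 4 = 32 bytes (the signature metadata).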

[0105] Each component of the basic and extended UMIDs will now be defined in turn. 


The 12-byte Universal Label 

[0106] The first 12 bytes of the UMID provide identification of the UMID by the registered string value defined in
table 1.

Byte No.   Description                                  Value (hex)
1          Object Identifier                            06h
2          Label size                                   0Ch
3          Designation: ISO                             2Bh
4          Designation: SMPTE                           34h
5          Registry: Dictionaries                       01h
6          Registry: Metadata Dictionaries              01h
7          Standard: Dictionary Number                  01h
8          Version number                               01h
9          Class: Identification and location           01h
10         Sub-class: Globally Unique Identifiers       01h
11         Type: UMID (Picture, Audio, Data, Group)     01, 02, 03, 04h
12         Type: Number creation method                 XXh

Table 1: Specification of the UMID Universal Label

[0107] The hex values in table 1 may be changed: the values given are examples. Also, the bytes 1-12 may have
designations other than those shown by way of example in the table. Referring to Table 1, in the example shown
byte 4 indicates that bytes 5-12 relate to a data format agreed by SMPTE. Byte 5 indicates that bytes 6 to 10 relate to
"dictionary" data. Byte 6 indicates that such data is "metadata" defined by bytes 7 to 10. Byte 7 indicates the part of the
dictionary containing metadata defined by bytes 9 and 10. Byte 8 indicates the version of the dictionary. Byte 9
indicates the class of data and Byte 10 indicates a particular item in the class.

[0108] In the present embodiment bytes 1 to 10 have fixed pre-assigned values. Byte 11 is variable. Thus, referring
to Figure 15, and to Table 1 above, it will be noted that the bytes 1 to 10 of the label of the UMID are fixed. Therefore
they may be replaced by a 1 byte 'Type' code T representing the bytes 1 to 10. The type code T is followed by a length
code L. That is followed by 2 bytes, one of which is byte 11 of Table 1 and the other of which is byte 12 of Table 1, an
instance number (3 bytes) and a material number (16 bytes). Optionally the material number may be followed by the
signature metadata of the extended UMID and/or other metadata.
[0109] The UMID type (byte 11) has 4 separate values to identify each of 4 different data types as follows:

'01h' = UMID for Picture material
'02h' = UMID for Audio material
'03h' = UMID for Data material
'04h' = UMID for Group material (i.e. a combination of related essence).

[0110] The last (12th) byte of the 12 byte label identifies the methods by which the material and instance numbers
are created. This byte is divided into top and bottom nibbles where the top nibble defines the method of Material number
creation and the bottom nibble defines the method of Instance number creation.
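
The division of the 12th byte into nibbles may be sketched as follows. The function name and the example value 25h are hypothetical; the meaning of each method code is defined by the standard and is not reproduced here.

```python
from typing import Tuple


def split_creation_methods(byte12: int) -> Tuple[int, int]:
    """Return (material_method, instance_method) from label byte 12."""
    if not 0 <= byte12 <= 0xFF:
        raise ValueError("byte 12 must be in the range 00h to FFh")
    material_method = byte12 >> 4    # top nibble
    instance_method = byte12 & 0x0F  # bottom nibble
    return material_method, instance_method


if __name__ == "__main__":
    print(split_creation_methods(0x25))
```

For the hypothetical value 25h, the top nibble is 2 and the bottom nibble is 5.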

Length 

[0111] The Length is a 1-byte number with the value '13h' for basic UMIDs and '33h' for extended UMIDs.
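
These two values are consistent with the length counting the bytes that remain after the 12-byte label and the 1-byte length field itself, as the following short check illustrates. The function and constant names are assumptions for this sketch.

```python
LABEL_BYTES = 12   # Universal Label
LENGTH_BYTES = 1   # the length field itself


def length_value(total_umid_bytes: int) -> int:
    """Bytes remaining after the label and the length field."""
    return total_umid_bytes - LABEL_BYTES - LENGTH_BYTES


if __name__ == "__main__":
    print(f"{length_value(32):02X}h")  # basic UMID, 32 bytes
    print(f"{length_value(64):02X}h")  # extended UMID, 64 bytes
```

That is, 32 - 13 = 19 = 13h for the basic UMID and 64 - 13 = 51 = 33h for the extended UMID.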

Instance Number



[0112] The Instance number is a unique 3-byte number which is created by one of several means defined by the
standard. It provides the link between a particular 'instance' of a clip and externally associated metadata. Without this
instance number, all material could be linked to any instance of the material and its associated metadata.

[0113] The creation of a new clip requires the creation of a new Material number together with a zero Instance
number. Therefore, a non-zero Instance number indicates that the associated clip is not the source material. An
Instance number is primarily used to identify associated metadata related to any particular instance of a clip.

Material Number

[0114] The 16-byte Material number is a non-zero number created by one of several means identified in the
standard. The number is dependent on a 6-byte registered port ID number, time and a random number generator.

Signature Metadata

[0115] Any component from the signature metadata may be null-filled where no meaningful value can be entered.
Any null-filled component is wholly null-filled to clearly indicate to a downstream decoder that the component is not valid.

The Time-Date Format

[0116] The date-time format is 8 bytes where the first 4 bytes are a UTC (Universal Time Code) based time com- 
ponent. The time is defined either by an AES3 32-bit audio sample clock or SMPTE 12M depending on the essence 
type. 

[0117] The second 4 bytes define the date based on the Modified Julian Date (MJD) as defined in SMPTE 309M.
This counts up to 999,999 days after midnight on the 17th November 1858 and allows dates to the year 4597.
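
As a sketch of the day count described above, a Modified Julian Date may be converted to and from a calendar date by counting days from the 17th November 1858. The function names are hypothetical, and the sketch does not model how the day count is packed into the 4 bytes, which is defined by SMPTE 309M.

```python
from datetime import date, timedelta

MJD_EPOCH = date(1858, 11, 17)  # day 0 of the Modified Julian Date count


def mjd_to_date(mjd: int) -> date:
    """Calendar date corresponding to a whole-day MJD count."""
    return MJD_EPOCH + timedelta(days=mjd)


def date_to_mjd(day: date) -> int:
    """Whole-day MJD count corresponding to a calendar date."""
    return (day - MJD_EPOCH).days


if __name__ == "__main__":
    print(mjd_to_date(51544))            # 1st January 2000
    print(date_to_mjd(date(2000, 1, 1)))
```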

The Spatial Co-ordinate Format 

[0118] The spatial co-ordinate value consists of three components defined as follows:

• Altitude: 8 decimal numbers specifying up to 99,999,999 metres.

• Longitude: 8 decimal numbers specifying East/West 180.00000 degrees (5 decimal places active).

• Latitude: 8 decimal numbers specifying North/South 90.00000 degrees (5 decimal places active).

[0119] The Altitude value is expressed as a value in metres from the centre of the earth thus allowing altitudes 
below the sea level. 
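
Since each of the three components holds 8 decimal numbers and the spatial co-ordinate value occupies 12 bytes in total, each component has 4 bytes available. One way to fit 8 decimal digits into 4 bytes is packed binary-coded decimal, sketched below. The use of packed BCD, the digit ordering (most significant digit first), the function names and the example altitude are all assumptions for illustration and are not stated in the description above.

```python
def pack_bcd8(digits: str) -> bytes:
    """Pack 8 decimal digits into 4 bytes, two digits per byte (assumed BCD)."""
    if len(digits) != 8 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("exactly 8 decimal digits are required")
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1]) for i in range(0, 8, 2)
    )


def unpack_bcd8(data: bytes) -> str:
    """Recover the 8 decimal digits from 4 packed bytes."""
    if len(data) != 4:
        raise ValueError("exactly 4 bytes are required")
    return "".join(f"{b >> 4}{b & 0x0F}" for b in data)


if __name__ == "__main__":
    # Hypothetical altitude of 6,371,000 metres from the centre of the earth.
    packed = pack_bcd8("06371000")
    print(packed.hex(), unpack_bcd8(packed))
```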

[0120] It should be noted that although spatial co-ordinates are static for most clips, this is not true for all cases.
Material captured from a moving source such as a camera mounted on a vehicle may show changing spatial
co-ordinate values.
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Country Code 

[0121] The Country code is an abbreviated 4-byte alpha-numeric string according to the set defined in ISO 3166.
Countries which are not registered can obtain a registered alpha-numeric string from the SMPTE Registration Authority.

Organisation Code 

[0122] The Organisation code is an abbreviated 4-byte alpha-numeric string registered with SMPTE. Organisation
codes have meaning only in relation to their registered Country code so that Organisation codes can have the same
value in different countries.

User Code 

[0123] The User code is a 4-byte alpha-numeric string assigned locally by each organisation and is not globally
registered. User codes are defined in relation to their registered Organisation and Country codes so that User codes may
have the same value in different organisations and countries.
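
The scoping of the three codes can be sketched by treating them as one combined key, so that a User code is only unique together with its Organisation and Country codes. This is an illustration under stated assumptions: the function names are hypothetical, codes shorter than 4 characters are padded with trailing spaces (the padding rule is not given here), and "ACME" is a made-up Organisation code.

```python
def pad_code(code: str) -> bytes:
    """Return a 4-byte alpha-numeric field (space padding is an assumption)."""
    if not 1 <= len(code) <= 4 or not code.isascii() or not code.isalnum():
        raise ValueError(f"not a valid alpha-numeric code: {code!r}")
    return code.ljust(4).encode("ascii")


def signature_key(country: str, organisation: str, user: str) -> bytes:
    """Combine Country, Organisation and User codes into one 12-byte key."""
    return pad_code(country) + pad_code(organisation) + pad_code(user)


# The same Organisation and User codes in two countries give distinct keys.
uk_user = signature_key("GB", "ACME", "0001")
fr_user = signature_key("FR", "ACME", "0001")
```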

Freelance Operators 

[0124] Freelance operators may use their country of domicile for the country code and use the Organisation and
User codes concatenated to e.g. an 8 byte code which can be registered with SMPTE. These freelance codes may start
with the '~' symbol (ISO 8859 character number 7Eh) followed by a registered 7 digit alphanumeric string.
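
A check for this form of freelance code might look like the sketch below. ISO 8859 character 7Eh is the tilde; the function name is hypothetical, and since the text says only that freelance codes "may" start with this symbol, the check describes that one form rather than every permitted code.

```python
import re

# '~' (character 7Eh) followed by exactly 7 alphanumeric characters,
# giving the 8 bytes of the combined Organisation and User fields.
_FREELANCE_CODE = re.compile(r"~[0-9A-Za-z]{7}")


def is_freelance_code(code: str) -> bool:
    """True if the 8-character code has the tilde-prefixed freelance form."""
    return _FREELANCE_CODE.fullmatch(code) is not None
```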

[0125] As will be appreciated by those skilled in the art various modifications may be made to the embodiments
hereinbefore described without departing from the scope of the present invention. For example whilst embodiments
have been described with recording audio/video onto magnetic tape, it will be appreciated that other recording media
are possible. Furthermore although the user generated metadata has been represented as text information, it will be
appreciated that any other forms of metadata may be generated either automatically or under control of the user and
received within the audio and/or video generation apparatus via an interface unit. Correspondingly the secondary
metadata may be any form of semantic or syntactic metadata.

[0126] As will be appreciated those features of the invention which appear in the example embodiments as a data
processor or processing units could be implemented in hardware as well as a software computer program running on
an appropriate data processor. Correspondingly those aspects and features of the invention which are described as
computer or application programs running on a data processor may be implemented as dedicated hardware. It will
therefore be appreciated that a computer program running on a data processor which serves to form an audio and/or
video generation apparatus as hereinbefore described is an aspect of the present invention. Similarly a computer
program recorded onto a recordable medium which serves to define the method according to the present invention or when
loaded onto a computer forms an apparatus according to the present invention are aspects of the present invention.

Claims 

1. An audio and/or video generation apparatus which is arranged in operation to generate audio and/or video signals
representative of an audio and/or visual source, said audio and/or video generation apparatus comprising

a recording means which is arranged in operation to record said audio and/or video signals on a recording 
medium, wherein 

said audio and/or video generation apparatus is arranged to receive metadata associated with said audio 
and/or video signals generated by a data processor, said recording means being arranged in operation to 
record said metadata on said recording medium with said audio and/or video signals. 

2. An audio and/or video generation apparatus as claimed in Claim 1, comprising an interface having a predetermined
format for connecting said data processor to said audio and/or video generation apparatus, whereby said
generation apparatus is arranged to receive said metadata.

3. An audio and/or video generation apparatus as claimed in Claims 1 or 2, wherein said data processor is 
arranged to detect signals representative of the time code of the recorded audio/video signals, and said metadata 
includes time code data representative of the in and out points of a take of the audio/video signals generated by 
said data processor.

4. An audio and/or video generation apparatus as claimed in Claims 1, 2 or 3, wherein said metadata is a unique
identification code for identifying the audio/video signals.

5. An audio and/or video generation apparatus as claimed in Claim 4, wherein the unique identification code is a
UMID or the like.

6. An audio and/or video generation apparatus which is arranged in operation to generate audio and/or video signals
representative of an audio and/or visual source, said audio and/or video generation apparatus comprising

a data processor which is arranged in operation to detect time codes associated with said audio and/or video
signals and to store data being representative of said time codes associated with at least part of said audio/video
signals in a data store.

7. An audio and/or video generation apparatus as claimed in Claim 6, wherein said time code data is representative
of the time codes at an in point and an out point of said at least part of the audio/video signals.

9. An audio and/or video generation apparatus as claimed in Claim 6 or 7, wherein said metadata includes a unique
identification code for identifying the audio/video signals. 

10. An audio and/or video generation apparatus as claimed in Claim 9, wherein the unique identification code is a
UMID or the like. 

11. A metadata generation tool which is arranged in operation to receive audio and/or video signals representative
of an audio and/or visual source, and to generate metadata associated with said audio and/or video signals, said
generation apparatus comprising

a data processor which is arranged in operation to generate said metadata in response to said audio and/or
video signals and to store said metadata associated with at least part of said audio/video signals in a data
store, wherein said data processor is arranged in operation to detect time codes associated with said audio
and/or video signals, said generated metadata being representative of said time codes associated with at least
part of said audio/video signals.

12. A metadata generation tool as claimed in Claim 11, wherein said metadata is representative of the time codes
at an in point and an out point of said at least part of the audio/video signals.

13. A metadata generation tool as claimed in Claim 11 or 12, wherein said metadata includes a unique identification
code for identifying the audio/video signals.

14. A metadata generation tool as claimed in Claim 13, wherein the unique identification code is a UMID or the like.

15. A metadata generation tool, wherein said audio/video signals are representative of items of audio/video material,
and data processor operates to generate a log of said time code data for each of said items of audio/video
material.

16. A metadata generation tool as claimed in Claim 15, comprising a data store wherein said data processor is
arranged in operation to store said log in said data store.

17. A method of generating audio/video signals comprising the steps of 

generating audio and/or video signals representative of an audio and/or visual source,

recording said audio and/or video signals on a recording medium, 

generating from said audio and/or video signals metadata describing said audio and/or video signals and 
storing said metadata. 

18. A method of generating as claimed in Claim 17, wherein the step of storing said metadata comprises the step of
recording said metadata on said recording medium with said audio and/or video signals.

19. A method of generating as claimed in Claim 17, wherein the step of storing said metadata comprises the step of
storing said metadata in a data store separate to said audio and/or video signals.

20. A method of generating as claimed in any of Claims 17, 18 or 19, wherein the step of generating said metadata
comprises the steps of

generating time codes identifying a location on said recording medium where said audio/video signals are
recorded,

detecting the time codes associated with the in and out points of part or parts of said audio/video signals, and 
forming said metadata from said detected in and out points. 

21. A computer program providing computer executable instructions, which when loaded on to a data processor
configures said data processor to operate as an audio and/or video generation apparatus as claimed in any of 
Claims 1 to 10, or a metadata generation tool as claimed in any of Claims 11 to 16. 

22. A computer program having computer executable instructions, which when loaded on to a data processor 
causes the data processor to perform the method according to any of Claims 17 to 20. 

23. A computer program product having a computer readable medium having recorded thereon information signals 
representative of the computer program claimed in any of Claims 21 or 22.